
@YuriNachos

Summary

  • Strip markdown code block markers from LLM responses before JSON parsing
  • Fixes JSONDecodeError when LLMs return JSON wrapped in ```json...```

Fixes

Fixes #1663

Details

Claude Sonnet and other LLMs sometimes return valid JSON wrapped in markdown
code blocks, even when JSON mode is enabled. This causes json.loads() to fail
with:

```
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
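A minimal reproduction of the failure, assuming a response shaped like the one described in the issue. The payload below is illustrative, not a captured Claude Sonnet response:

```python
import json

# Illustrative LLM output: valid JSON, but wrapped in a markdown code fence.
fenced = '```json\n{"name": "example"}\n```'

try:
    json.loads(fenced)
    message = None
except json.JSONDecodeError as exc:
    # The first character is a backtick, so parsing fails at char 0.
    message = str(exc)

print(message)  # Expecting value: line 1 column 1 (char 0)
```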

Before attempting JSON parsing in JsonCssExtractionStrategy.generate_schema(), the fix pre-processes the LLM response to remove common markdown patterns:

  • ```json\n...\n```
  • ```\n...\n```
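A minimal sketch of this pre-processing step for the two fence patterns listed above. The helper name `strip_markdown_fences` and the regex are hypothetical illustrations; this is not the code from this PR or the `_strip_markdown_fences()` utility on the develop branch, and it deliberately does not handle fences that open and close on the same line:

```python
import json
import re

# Hypothetical pattern: an opening ```json (or bare ```) fence on its own line,
# the captured body, then a closing ``` fence at the very end of the text.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL | re.IGNORECASE)


def strip_markdown_fences(text: str) -> str:
    """Return the body of a single surrounding code fence, or the text unchanged."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1).strip() if match else stripped


raw = '```json\n{"name": "example", "fields": []}\n```'
schema = json.loads(strip_markdown_fences(raw))
print(schema["name"])  # example
```

Anything that does not match a surrounding fence is returned unchanged apart from outer whitespace, which json.loads() ignores anyway, so unwrapped JSON is parsed exactly as before.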

Test plan

  • Code review confirms the fix handles common markdown wrapping patterns
  • Preserves valid JSON responses without markdown
  • Maintains backward compatibility with existing behavior
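The "preserves valid JSON" and "backward compatibility" points can be made checkable with a variant that strips fences only when the plain parse fails. This is a sketch under the same assumptions as before (hypothetical helper name `parse_llm_json`, illustrative regex), with the test plan bullets written as assertions:

```python
import json
import re

# Hypothetical pattern: ```json or bare ``` fence wrapping the whole response.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL | re.IGNORECASE)


def parse_llm_json(text: str):
    """Parse JSON, falling back to fence stripping only if the plain parse fails."""
    try:
        # Unwrapped JSON takes exactly the same path as it did before the fix.
        return json.loads(text)
    except json.JSONDecodeError:
        match = _FENCE_RE.match(text.strip())
        if match is None:
            raise  # not a fenced response either; surface the original error
        return json.loads(match.group(1))


# Handles common markdown wrapping patterns.
assert parse_llm_json('```json\n{"a": [1, 2]}\n```') == {"a": [1, 2]}
assert parse_llm_json('```\n[]\n```') == []
# Preserves valid JSON responses without markdown.
assert parse_llm_json('{"a": [1, 2]}') == {"a": [1, 2]}
```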

🤖 Generated with Claude Code

Fixes unclecode#1663

Claude Sonnet and other LLMs sometimes return valid JSON wrapped in
markdown code blocks (```json...```), causing JSONDecodeError
in JsonCssExtractionStrategy.generate_schema().

Added pre-processing to strip markdown code block markers before JSON
parsing, handling both ```json and ``` formats.

Co-Authored-By: Claude <noreply@anthropic.com>
YuriNachos force-pushed the fix/issue-1663-llm-json-markdown branch from cc5ffd3 to 8d619f5 on January 17, 2026 11:15
@unclecode
Owner

Thanks for the fix! This issue has already been addressed on the develop branch; there's now a _strip_markdown_fences() utility that handles this case in agenerate_schema(). Closing as already resolved, but appreciate you spotting it.

unclecode closed this on Feb 1, 2026
unclecode added a commit that referenced this pull request Feb 1, 2026
- PR #1714: Replace tf-playwright-stealth with playwright-stealth
- PR #1721: Respect <base> tag in html2text for relative links
- PR #1719: Include GoogleSearchCrawler script.js in package data
- PR #1717: Allow local embeddings by removing OpenAI fallback
- Fix: Extract <base href> from raw HTML before head gets stripped
- Close duplicates: #1703, #1698, #1697, #1710, #1720
- Update CONTRIBUTORS.md and PR-TODOLIST.md

Development

[Bug]: Claude Sonnet returns markdown-wrapped JSON despite JSON mode being enabled in generate_schema